Reimagine BiSeNet for Real-Time Domain Adaptation in Semantic Segmentation
Semantic segmentation models have reached remarkable performance across
various tasks. However, this performance is achieved with extremely large
models, using powerful computational resources and without considering training
and inference time. Real-world applications, on the other hand, necessitate
models with minimal memory demands and efficient inference speed, executable
on low-resource embedded devices such as those in self-driving vehicles. In this
paper, we look at the challenge of real-time semantic segmentation across
domains, and we train a model to act appropriately on real-world data even
though it was trained on a synthetic domain. We employ a new lightweight and
shallow discriminator that was specifically created for this purpose. To the
best of our knowledge, we are the first to present a real-time adversarial
approach for addressing the domain adaptation problem in semantic segmentation. We
tested our framework on the two standard protocols: GTA5 to Cityscapes and
SYNTHIA to Cityscapes. Code is available at:
https://github.com/taveraantonio/RTDA
Comment: Accepted at I-RIM 3D 202
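The abstract hinges on a lightweight, shallow discriminator for real-time adversarial adaptation. As a rough illustration of that idea, here is a minimal PyTorch sketch of a shallow fully-convolutional discriminator scoring the segmenter's softmax output; the widths, depth, and names are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowDiscriminator(nn.Module):
    """Shallow fully-convolutional domain discriminator that produces a
    patch-level "source vs. target" logit map from the segmenter's
    C-channel softmax output. Sizes are illustrative, not the paper's."""
    def __init__(self, num_classes: int, width: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, width, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width, 2 * width, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(2 * width, 1, 4, stride=2, padding=1),  # logit map
        )

    def forward(self, seg_softmax: torch.Tensor) -> torch.Tensor:
        return self.net(seg_softmax)

# Adversarial step (sketch): train the segmenter so that target-domain
# predictions are classified as "source" by the discriminator.
# d_out = disc(target_logits.softmax(dim=1))
# adv_loss = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
```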
Viewpoint Invariant Dense Matching for Visual Geolocalization
In this paper we propose a novel method for image matching based on dense local features and tailored for visual geolocalization. Dense local feature matching is robust against changes in illumination and occlusions, but not against viewpoint shifts, which are a fundamental aspect of geolocalization. Our method, called GeoWarp, directly embeds invariance to viewpoint shifts in the process of extracting dense features. This is achieved via a trainable module which learns from the data an invariance that is meaningful for the task of recognizing places. We also devise a new self-supervised loss and two new weakly supervised losses to train this module using only unlabeled data and weak labels. GeoWarp is implemented efficiently as a re-ranking method that can be easily embedded into pre-existing visual geolocalization pipelines. Experimental validation on standard geolocalization benchmarks demonstrates that GeoWarp boosts the accuracy of state-of-the-art retrieval architectures. The code and trained models are available at https://github.com/gmberton/geo_war
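Since GeoWarp is described as a re-ranking step over dense local features, a hedged sketch of that pipeline stage may help. Here, `warp_module` is a hypothetical stand-in for the learned viewpoint-invariance module, and the scoring is a generic mutual-similarity heuristic, not GeoWarp's exact procedure.

```python
import torch.nn.functional as F

def rerank(query_feats, candidate_feats, warp_module=None):
    """Re-rank retrieval candidates by dense local-feature similarity.
    query_feats: (C, H, W) dense features of the query image.
    candidate_feats: list of (C, H, W) features of retrieved candidates.
    warp_module is a hypothetical stand-in for a learned module that
    aligns a candidate's features to the query's viewpoint."""
    q = F.normalize(query_feats.flatten(1), dim=0)            # (C, H*W)
    scores = []
    for cand in candidate_feats:
        if warp_module is not None:
            cand = warp_module(query_feats, cand)             # viewpoint alignment
        c = F.normalize(cand.flatten(1), dim=0)               # (C, H*W)
        sim = q.t() @ c                                       # pairwise cosine sims
        scores.append(sim.max(dim=1).values.mean().item())    # mean best match
    # candidate indices, best first
    return sorted(range(len(scores)), key=lambda i: -scores[i])
```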
IDDA: a large-scale multi-domain dataset for autonomous driving
Semantic segmentation is key in autonomous driving. Using deep visual
learning architectures is not trivial in this context, because of the
challenges in creating suitable large-scale annotated datasets. This issue has
been traditionally circumvented through the use of synthetic datasets, which
have become a popular resource in this field. Their release has been accompanied
by the need for semantic segmentation algorithms able to close the visual
domain shift between the training and test data. Although exacerbated by the
use of artificial data, the problem is extremely relevant in this field even
when training on real data. Indeed, weather conditions, viewpoints and city
appearance can vary considerably from car to car, and even at test time for a
single, specific vehicle. How to deal with domain
adaptation in semantic segmentation, and how to leverage effectively several
different data distributions (source domains) are important research questions
in this field. To support work in this direction, this paper contributes a new
large-scale synthetic dataset for semantic segmentation with more than 100
different source visual domains. The dataset has been created to explicitly
address the challenges of domain shift between training and test data in
various weather and viewpoint conditions, in seven different city types.
Extensive benchmark experiments assess the dataset, showcasing open challenges
for the current state of the art. The dataset will be available at:
https://idda-dataset.github.io/home/
Comment: Accepted at IROS 2020 and RA-L.
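The abstract raises the question of leveraging several source domains at once. One common, generic way to do this (not necessarily the benchmark protocol used with IDDA) is to mix training batches across per-domain loaders, as in the sketch below; it assumes standard PyTorch-style loaders yielding (image, label) pairs, and the interface is illustrative.

```python
import random

def multi_domain_batches(loaders_by_domain, steps):
    """Yield (domain, batch) pairs by sampling a source domain at each
    step, so training sees all visual domains (weather, viewpoint, city
    type). Assumes standard per-domain loaders of (image, label) pairs;
    names and interface are illustrative, not tied to IDDA."""
    iters = {d: iter(dl) for d, dl in loaders_by_domain.items()}
    domains = list(iters)
    for _ in range(steps):
        d = random.choice(domains)              # pick a source domain
        try:
            batch = next(iters[d])
        except StopIteration:                   # restart an exhausted domain
            iters[d] = iter(loaders_by_domain[d])
            batch = next(iters[d])
        yield d, batch
```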
EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition
Visual Place Recognition is a task that aims to predict the place of an image
(called query) based solely on its visual features. This is typically done
through image retrieval, where the query is matched to the most similar images
from a large database of geotagged photos, using learned global descriptors. A
major challenge in this task is recognizing places seen from different
viewpoints. To overcome this limitation, we propose a new method, called
EigenPlaces, to train our neural network on images from different points of
view, which embeds viewpoint robustness into the learned global descriptors.
The underlying idea is to cluster the training data so as to explicitly present
the model with different views of the same points of interest. The selection of
these points of interest is done without the need for extra supervision. We then
present experiments on the most comprehensive set of datasets in the literature,
finding that EigenPlaces is able to outperform previous state of the art on the
majority of datasets, while requiring 60% less GPU memory for training and
using 50% smaller descriptors. The code and trained models for EigenPlaces are
available at https://github.com/gmberton/EigenPlaces, while
results with any other baseline can be computed with the codebase at
https://github.com/gmberton/auto_VPR
Comment: ICCV 202
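The core idea stated above, clustering the training data so the model sees the same points of interest from different views, can be sketched as follows. The cell size and the use of an SVD to find each cell's dominant direction are illustrative assumptions, not the exact EigenPlaces procedure.

```python
import numpy as np

def cells_and_directions(utm_coords, cell_size=25.0):
    """Bucket geotagged images into square spatial cells, then find each
    cell's dominant direction via an SVD of the centered positions.
    Illustrative sketch of the clustering idea only."""
    cells = {}
    for idx, (east, north) in enumerate(utm_coords):
        key = (int(east // cell_size), int(north // cell_size))
        cells.setdefault(key, []).append(idx)
    directions = {}
    for key, indices in cells.items():
        pts = np.asarray([utm_coords[i] for i in indices], dtype=float)
        centered = pts - pts.mean(axis=0)
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        directions[key] = vt[0]        # first principal (eigen) direction
    return cells, directions
```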
Augmentation Invariance and Adaptive Sampling in Semantic Segmentation of Agricultural Aerial Images
In this paper, we investigate the problem of Semantic Segmentation for
agricultural aerial imagery. We observe that the existing methods used for this
task are designed without considering two characteristics of the aerial data:
(i) the top-down perspective implies that the model cannot rely on a fixed
semantic structure of the scene, because the same scene may be experienced with
different rotations of the sensor; (ii) there can be a strong imbalance in the
distribution of semantic classes because the relevant objects of the scene may
appear at extremely different scales (e.g., a field of crops and a small
vehicle). We propose a solution to these problems based on two ideas: (i) we
combine a set of suitable augmentations with a consistency loss to guide the
model to learn semantic representations that are invariant to the photometric
and geometric shifts typical of the top-down perspective (Augmentation
Invariance); (ii) we use a sampling method (Adaptive Sampling) that selects the
training images based on a measure of the pixel-wise class distribution and
actual network confidence. With an extensive set of experiments conducted on
the Agriculture-Vision dataset, we demonstrate that our proposed strategies
improve the performance of the current state-of-the-art method.
Comment: CVPR 2022 Workshop - Agriculture Vision
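As a rough sketch of idea (i), a consistency loss can tie together predictions on the original image and on a photometrically and geometrically shifted copy. The specific transforms and the KL objective below are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, images, photometric_aug, k: int):
    """Penalize disagreement between predictions on an image and on a
    photometrically jittered, rotated copy (rotation undone before the
    comparison). `photometric_aug` is any color-jitter-style transform;
    transforms and objective are illustrative assumptions."""
    logits = model(images)                                    # (B, C, H, W)
    shifted = torch.rot90(photometric_aug(images), k, dims=(2, 3))
    logits_shifted = model(shifted)
    logits_shifted = torch.rot90(logits_shifted, -k, dims=(2, 3))
    return F.kl_div(F.log_softmax(logits_shifted, dim=1),
                    F.softmax(logits, dim=1).detach(),
                    reduction="batchmean")
```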
Mask2Anomaly: Mask Transformer for Universal Open-set Segmentation
Segmenting unknown or anomalous object instances is a critical task in
autonomous driving applications, and it is approached traditionally as a
per-pixel classification problem. However, reasoning about each pixel
individually, without considering contextual semantics, results in high
uncertainty around the objects' boundaries and numerous false positives. We
propose a paradigm change by shifting from a per-pixel classification to a mask
classification. Our mask-based method, Mask2Anomaly, demonstrates the
feasibility of integrating a mask-classification architecture to jointly
address anomaly segmentation, open-set semantic segmentation, and open-set
panoptic segmentation. Mask2Anomaly includes several technical novelties that
are designed to improve the detection of anomalies/unknown objects: i) a global
masked attention module to focus individually on the foreground and background
regions; ii) a mask contrastive learning that maximizes the margin between an
anomaly and known classes; iii) a mask refinement solution to reduce false
positives; and iv) a novel approach to mine unknown instances based on the
mask-architecture properties. Through comprehensive qualitative and quantitative
evaluation, we show that Mask2Anomaly achieves new state-of-the-art results across
the benchmarks of anomaly segmentation, open-set semantic segmentation, and
open-set panoptic segmentation.
Comment: 16 pages. arXiv admin note: substantial text overlap with arXiv:2307.1331
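As a loose illustration of novelty (ii), a margin-style objective can push the maximum known-class probability down on anomalous pixels and up elsewhere, enlarging the gap between anomalies and known classes. The formulation below is a generic sketch, not Mask2Anomaly's actual loss.

```python
import torch.nn.functional as F

def margin_loss(class_probs, outlier_mask, margin: float = 0.5):
    """Hinge-style margin between anomalies and known classes: the max
    known-class probability is pushed below `margin` on anomalous pixels
    and above it on known ones. Generic sketch of the idea only.
    class_probs: (B, C, H, W) known-class probabilities.
    outlier_mask: (B, H, W) boolean map of anomalous pixels
    (e.g. from auxiliary outlier-exposure data)."""
    max_known = class_probs.max(dim=1).values               # (B, H, W)
    out = outlier_mask.bool()
    zero = max_known.new_zeros(())
    loss_out = F.relu(max_known[out] - margin).mean() if out.any() else zero
    loss_in = F.relu(margin - max_known[~out]).mean() if (~out).any() else zero
    return loss_out + loss_in
```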